BioData Mining — Latest Matching Preprints

1

Alcohol consumption during pregnancy dysregulates maternofetal angiogenic and inflammatory factors with sex specificities

Sautreuil, C.; Lesueur, C.; Pinto Cardoso, G.; Bruel, H.; Biran, V.; Muller, J.-B.; Duigou, A.-L.; Datin-Dorriere, V.; Verspyck, E.; Marguet, F.; Laquerriere, A.; Gressens, P.; Gonzalez, B.; Marret, S.

2026-07-17 pediatrics 10.64898/2026.07.15.26357094 medRxiv

Top 0.1%

2.6%

Prenatal alcohol exposure (PAE) is a major cause of neurodevelopmental disorders, yet most children are diagnosed late or misdiagnosed. Neuroplacentology suggests that placental factors released into maternal and/or umbilical cord blood contribute to fetal brain development. Consistently, a preclinical inter-organ transcriptomic database revealed that PAE disrupts the expression ratio of angiogenic and inflammatory factors, suggesting an angio-inflammatory response. This study aimed i) to assay, by multiplex immunoassay, angiogenic and inflammatory factors in maternal and umbilical cord blood from alcohol-consuming women and ii) to perform a maternofetal analysis according to neonatal sex. Dysregulated factors from mothers who gave birth to females or males were then submitted to STRING and ShinyGO analyses. Results showed that PAE altered the distribution profiles of dysregulated angiogenic and inflammatory factors differently in maternal and umbilical cord blood. Moreover, sex-specific differences were observed, with 36% of dysregulated proteins specific to males, 48% to females, and 16% common to both. STRING analysis revealed robust functional protein-protein interactions linking inflammatory and angiogenic clusters, while ShinyGO analysis identified enriched pathways related to vascular shear stress. These findings provide the first maternofetal analysis of combined angiogenic and inflammatory factors from alcohol-consuming mothers.

2

The Shape of a Final Message: An Emotional Landscape in the Language of Suicide

Pestian, J. P.; Jacobson, D. A.; Pedapati, E. V.; Mendonca, E. A.; McMahon, B. H.; Ive, J.; Glauser, T. A.

2026-07-17 psychiatry and clinical psychology 10.64898/2026.07.16.26358230 medRxiv

Top 0.2%

2.3%

The emotional content of suicide notes is typically examined using categorical coding, where each labeled passage is treated in isolation from its surrounding language. In contrast, dimensional models of psychopathology propose that affective content varies along continuous gradients. We evaluated this proposition directly. Excerpts from 884 annotated suicide notes were embedded in a semantic space defined solely by their linguistic properties, and we investigated whether human-assigned emotion labels changed smoothly across this space. They did: affective tone showed clear spatial autocorrelation (Moran's $I = 0.18$, $z = 19.68$, $p < 0.001$), an effect that replicated across three different encoders and remained after removing all within-note dependencies. Emotions occupied recognizable yet overlapping regions rather than forming distinct clusters and varied substantially in how tightly they were concentrated: love and hopelessness appeared with similar frequency, but love was far more localized ($z = 15.7$ versus $10.8$). Among all emotions, hopelessness was the most linguistically diffuse, implying that a single categorical label is capturing multiple, qualitatively different manifestations of suicidal distress.
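The spatial-autocorrelation test reported above (Moran's I with a z-score) can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes each excerpt's emotion label is coded as a 0/1 indicator, that spatial weights come from a binary k-nearest-neighbour graph in embedding space, and that the z-score is obtained by permuting labels over locations. `knn_weights`, `morans_i` and `permutation_z` are hypothetical helper names, and the toy embeddings are made up.

```python
import math
import random
import statistics
from typing import List, Sequence


def knn_weights(points: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Binary k-nearest-neighbour weights: w[i][j] = 1 if j is among i's k nearest points."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        ranked = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for _, j in ranked[:k]:
            w[i][j] = 1.0
    return w


def morans_i(values: Sequence[float], w: Sequence[Sequence[float]]) -> float:
    """Global Moran's I: (N / W) * sum_ij w_ij d_i d_j / sum_i d_i^2, with d = x - mean."""
    n = len(values)
    mean = sum(values) / n
    d = [v - mean for v in values]
    w_total = sum(sum(row) for row in w)
    cross = sum(w[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
    return (n / w_total) * cross / sum(x * x for x in d)


def permutation_z(values: Sequence[float], w: Sequence[Sequence[float]],
                  n_perm: int = 999, seed: int = 0) -> float:
    """z-score of the observed I against a null built by shuffling labels over locations."""
    rng = random.Random(seed)
    observed = morans_i(values, w)
    shuffled = list(values)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(morans_i(shuffled, w))
    return (observed - statistics.mean(null)) / statistics.stdev(null)


# Toy example: 1 = excerpt carries a given emotion label, 0 = it does not (hypothetical data).
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = [1, 1, 1, 0, 0, 0]
w = knn_weights(embeddings, k=2)
print(morans_i(labels, w), permutation_z(labels, w))
```

Positive I means similar labels sit near each other in the embedding; values near zero mean no spatial structure, and negative values mean neighbours tend to disagree.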

3

Hypertension Phenotypes in a National Database: A Three-Axis State Model Integrating Diagnosis, Treatment Intensity, and Blood Pressure Control (The NDB-K7Ps-Study-8)

Nakajima, K.; Sekine, A.

2026-07-19 cardiovascular medicine 10.64898/2026.07.16.26358276 medRxiv

Top 0.2%

2.1%

Hypertension is commonly defined as a binary condition despite substantial heterogeneity in diagnosis, treatment, and blood pressure (BP) control. We propose a three-axis state model integrating diagnosis status, treatment intensity, and BP control to better characterize hypertension phenotypes. The framework generates 27 possible states that can be condensed into seven clinically meaningful groups. We applied the model to 5,129,584 Japanese adults using the National Database of Health Insurance Claims and Specific Health Checkups. Hierarchical cluster analysis, sensitivity analysis excluding patients with cardiovascular diseases other than hypertension, and validation against antihypertensive medication use were performed. Overall, 64% of participants were classified as normotensive, whereas 36% belonged to hypertension-related groups, including 11% with unrecognized hypertension and 7% with diagnosed but untreated hypertension. Agreement with data-driven hierarchical cluster analysis was substantial (weighted κ = 0.87). The group distribution remained largely unchanged in the sensitivity analysis, supporting the robustness of the proposed classification. Hypertension diagnosis also showed high validity, with a sensitivity of 96.5%, specificity of 91.8%, and substantial agreement with antihypertensive medication use (κ = 0.78). This three-axis framework provides a robust and clinically interpretable approach for characterizing hypertension phenotypes, enabling systematic identification of care gaps and supporting research, clinical decision-making, and population health management.
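The weighted kappa used to compare the rule-based groups with the hierarchical clusters can be illustrated as follows. This is a sketch only: the abstract does not state the weighting scheme, so linear disagreement weights are assumed (quadratic is offered as an option), the groups are assumed to have a meaningful order, and cluster labels are assumed to have already been matched to group names. `weighted_kappa` and the group labels in the example are hypothetical.

```python
from typing import Hashable, Sequence


def weighted_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable],
                   categories: Sequence[Hashable], scheme: str = "linear") -> float:
    """Cohen's weighted kappa with linear or quadratic disagreement weights.

    `categories` must be listed in their assumed order.
    """
    k = len(categories)
    pos = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Joint proportions of (rater_a category, rater_b category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[pos[a]][pos[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        # 0 on the diagonal, 1 for the most distant pair of categories.
        d = abs(i - j) / (k - 1)
        return d if scheme == "linear" else d * d

    obs_disagree = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_disagree = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - obs_disagree / exp_disagree


order = ["G1", "G2", "G3"]  # hypothetical ordered group labels
rule_based = ["G1", "G1", "G2", "G3", "G3"]
clusters = ["G1", "G2", "G2", "G3", "G3"]
print(round(weighted_kappa(rule_based, clusters, order), 3))
```

Weighting means a patient placed one group away counts as a partial disagreement rather than a full one, which is why weighted kappa suits ordered phenotype groups.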

4

Leveraging global PhPID framework to enable more granular signal detection and characterization in VigiBase: a dexamethasone case study.

Vasconcelos-Blomberg, P.; Felix China, J.; Syeda, B. R.; Fladvad, M.; Lagerlund, O.; Gattepaille, L. M.; Fusaroli, M.

2026-07-15 pharmacology and therapeutics 10.64898/2026.07.13.26357959 medRxiv

Top 0.4%

1.3%

Introduction: Conventional substance-level disproportionality analysis may miss safety patterns specific to a dose form, route, or intended site. More granular analyses are hindered by incomplete, inconsistent reporting of product information. The Pharmaceutical Product Identifier (PhPID), representing products by substance, strength, and dose form, may support more granular analyses. Objective: To explore the use of PhPID-like dose form information for site-specific disproportionality analysis in dexamethasone. Methods: We evaluated VigiBase reports (January 1, 2001 - December 31, 2024) for completeness of dose form and route data. We standardized dexamethasone entries to PhPID Level 3 standards, representing substance and administrable dose form. Through disproportionality analysis (Information Component, IC), we compared substance-level and site-specific results. Results: Among 56.4 million suspected/interacting drugs, dose form was reported in 47.7% and route in 69.4%. Among 109,248 dexamethasone entries, 703 dose form and 80 route variations were mapped to 53 and 44 standard codes, respectively; about half could be mapped unambiguously. Site-specific analyses revealed biologically plausible patterns not apparent in substance-level analyses. Ocular use showed higher ICs for glaucoma and cataract, while systemic use showed higher ICs for psychiatric and endocrine events (e.g., depression, agitation, Cushing's syndrome). IC time trends suggested that some signals (e.g., cataract with ocular use) could emerge earlier in site-specific analyses. Conclusion: More granular product information, aligned with PhPID, may improve signal detection and characterization of site-specific safety issues. These findings support granular identifiers in pharmacovigilance while highlighting the need for better capture and standardization of dose form and route of administration data.
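The Information Component contrasts observed and expected report counts on a log2 scale. The sketch below uses the shrunk form IC = log2((O + 0.5) / (E + 0.5)), with E = N_drug × N_event / N_total, which is the form commonly described for VigiBase; it omits the credibility interval (IC025), and every count in the example is hypothetical. Computing it once for all dexamethasone reports and once for a site-specific subset (for example, ocular dose forms only) mirrors the substance-level versus site-specific comparison described above.

```python
import math


def information_component(n_observed: int, n_drug: int, n_event: int,
                          n_total: int, shrinkage: float = 0.5) -> float:
    """Shrunk observed-to-expected ratio on a log2 scale.

    n_observed: reports mentioning both the drug (or drug subset) and the event
    n_drug:     reports mentioning the drug (or drug subset)
    n_event:    reports mentioning the event
    n_total:    all reports in the database
    """
    expected = n_drug * n_event / n_total
    return math.log2((n_observed + shrinkage) / (expected + shrinkage))


# Hypothetical counts for illustration only (not VigiBase figures).
ic_all = information_component(n_observed=40, n_drug=10_000, n_event=5_000, n_total=1_000_000)
ic_ocular = information_component(n_observed=30, n_drug=1_000, n_event=5_000, n_total=1_000_000)
print(ic_all, ic_ocular)
```

With these made-up counts the substance-level IC is negative while the ocular-only IC is clearly positive: a site-specific excess can be diluted when all dose forms are pooled.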

5

Developing a Heart Failure Readmission Model From Inpatient Electronic Medical Record Data

Martin, E. A.; Lee, S.; Walker, R.; Pitka, E.; Soroush, M. Z.; Ezekowitz, J.; Howlett, J. G.; Fine, N. M.; Bakal, J. A.; Quan, H.; Eastwood, C. A.

2026-07-21 cardiovascular medicine 10.64898/2026.07.18.26358391 medRxiv

Top 1%

0.6%

Importance: Heart failure readmissions remain common following hospitalization, but accurately identifying which patients will be readmitted after discharge remains challenging. Improved prediction could support targeted transitional care interventions and more efficient allocation of clinical resources. Objective: In this study we attempted to improve readmission prediction after heart failure hospitalization by using variables chosen through a modified Delphi process, and using inpatient Electronic Medical Record (EMR) data, focusing on clinical notes. Design: This prognostic study developed competing risk survival models to predict readmission after heart failure hospitalization. Variables were chosen using a modified Delphi process, and extracted from EMR notes using various natural language processing techniques or from other EMR elements where appropriate. Patients were admitted between 2011 and 2019, and at least one year of follow-up was available for all patients. Models were evaluated using C-statistics, as well as sensitivity, specificity, positive and negative predictive values. Setting: During the study period, all acute-care facilities in Calgary, Alberta used the same EMR system, from which patients were selected. Participants: Patients were 18 years or older, resided in Alberta, and were admitted to a Calgary hospital. All corresponding admissions with a most responsible diagnosis of heart failure were included (n=15,160). Main Outcomes and Measures: The main outcome of interest was readmission within 30 days, though 90- and 365-day time frames were also analyzed. Death was treated as a competing risk and analysed at those time frames as well.
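The threshold-based measures listed above all derive from a 2 × 2 table of predicted versus observed readmission. The sketch below assumes a binary 30-day outcome and a single risk threshold, and it ignores the competing risk of death, so it is a simplification of the competing-risk models described; `confusion_counts` and `classification_metrics` are hypothetical names.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], risk: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN) when risk >= threshold is called 'readmitted'."""
    tp = fp = fn = tn = 0
    for y, r in zip(y_true, risk):
        predicted = r >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    return {
        "sensitivity": tp / (tp + fn),  # readmitted patients correctly flagged
        "specificity": tn / (tn + fp),  # non-readmitted patients correctly not flagged
        "ppv": tp / (tp + fp),          # flagged patients who were readmitted
        "npv": tn / (tn + fn),          # unflagged patients who were not readmitted
    }


# Hypothetical outcomes and predicted risks for four patients.
print(classification_metrics(*confusion_counts([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1], 0.5)))
```

Unlike the C-statistic, all four measures depend on the chosen threshold, so they are usually reported alongside it.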

6

Are CNV Risk Scores Linked to Neurodevelopmental and Mental Health Characteristics Within CNV-Associated Intellectual Disability?

Chi, Z.; Alexander-Bloch, A.; Neufeld, S. A.; Wolstencroft, J.; Skuse, D.; IMAGINE-ID consortium; Baker, K.

2026-07-16 psychiatry and clinical psychology 10.64898/2026.07.14.26358034 medRxiv

Top 2%

0.4%

Background: Children and young people (CYP) with intellectual disability (ID) frequently have co-occurring neurodevelopmental (ND) and mental health (MH) difficulties. While copy number variants (CNVs) are recognised as an important aetiology of ID, it is unclear whether and how CNV risk scores predict ND and MH characteristics within the CNV-associated ID population. Methods: We analysed data from the UK-based IMAGINE-ID cohort of CYP (aged 4-19 years) with ID and clinically reported CNVs (N = 1,640). CNVs were annotated with Gencode 19 in ENSEMBL to calculate CNV risk scores, including summed probability of loss-of-function intolerance (pLI) and dosage sensitivity. Multivariate regression models examined whether CNV variables and inheritance predicted ND and MH characteristics, assessed via the Development and Well-Being Assessment (DAWBA). Post-hoc analyses explored CNV variable stratification (lower vs. higher range pLI). Results: Higher summed pLI scores (indexing CNV genes' intolerance to loss of function) unexpectedly predicted fewer MH difficulties and a lower likelihood of ND diagnoses, even after accounting for demographic factors and CNV inheritance. Post-hoc analyses identified a threshold effect. Within the lower pLI range, higher pLI scores were associated with greater MH difficulties, consistent with findings from population-based samples. In contrast, within the higher pLI range, higher pLI scores were associated with fewer MH difficulties (among individuals more likely to have severe ID). Conclusion: These findings challenge the assumption that CNV genomic "risk scores" universally predict ND and MH difficulties. Instead, within CNV-associated ID, complex relationships exist between CNV risk scores, inheritance and phenotypes. These insights emphasise the necessity of integrating genomic results with familial and developmental context to understand individual vulnerabilities and support needs.

7

Life-Stage Heterogeneity in the Mental Health Treatment Gap: An Unsupervised Machine Learning Profiling of Symptomatic US Adults

Forday, W. L.

2026-07-15 psychiatry and clinical psychology 10.64898/2026.07.14.26358030 medRxiv

Top 2%

0.4%

Background: Despite a rising global psychiatric burden, a treatment gap persists in which the majority of symptomatic individuals remain unmedicated. Traditional epidemiological analyses treat this untreated population as a single, uniform block, obscuring specific barriers to care. This study uses an unsupervised machine learning pipeline to identify distinct socio-behavioural and biological sub-populations within the untreated cohort to guide targeted public health interventions. Methods: Data were pooled from the 2015-2018 National Health and Nutrition Examination Survey (NHANES) cycles (N=11,848 total adult respondents). A symptomatic cohort of 3,075 individuals experiencing daily or weekly anxiety or depression symptoms was isolated, excluding severe liver pathology outliers (GGT ≥ 80 U/L). A 22-feature matrix combining continuous clinical biomarkers (systolic blood pressure, waist circumference, HbA1c) and categorical social variables was projected using Factor Analysis of Mixed Data (FAMD). Latent sub-populations were identified via Gaussian Mixture Modelling (GMM), optimized by the Bayesian Information Criterion (BIC). Results: The broad baseline population revealed a substantial mental health burden, with 30.4% reporting active psychiatric symptoms, of whom 71.6% were entirely unmedicated. The GMM pipeline isolated three distinct sub-populations (k=3) separated by age, clinical strain, and treatment rates: Cluster 0 (Mature Adults, mean age 55.03): high psychiatric severity (34.1% severe untreated), central obesity, and hypertensive strain (135.82 mmHg), with 64.1% untreated despite frequent primary care contact; Cluster 1 (Working Professionals, mean age 38.38): highly educated, predominantly female (70.5%), with 77.7% untreated, driven by moderate distress; Cluster 2 (Emerging Youth, mean age 18.49): a highly vulnerable late-adolescent group with a 90.2% untreated rate.
Conclusion: The unmedicated symptomatic population is highly diverse and segmented by life stage. These profiles show that the treatment gap is driven by age-specific barriers, specifically workforce-age symptom masking and late-adolescent developmental transitions. Closing this deficit requires shifting from uniform public health approaches toward targeted interventions, such as digital peer support networks for youth and integrated primary care screenings for older adults.

8

Reconsidering the case against risk prediction in self-harm: routinely collected health data distinguishes groups at higher and lower risk of adverse outcomes following paracetamol overdose

Oxley, J.; Schölin, L.; Brennan, G.; Anand, A.; Brett, J.; Eddleston, M.; Humphries, C.

2026-07-17 psychiatry and clinical psychology 10.64898/2026.07.15.26358127 medRxiv

Top 2%

0.4%

Background. UK clinical guidance recommends that structured risk prediction tools and risk stratification should not be used in self-harm, to predict suicide or determine who is offered treatment. Underpinning this position is the premise that routinely collected health data contain no useful predictive signal, which has received little direct scrutiny. Objective. To test whether routinely collected electronic health record data can distinguish groups at higher and lower risk of severe outcomes following paracetamol overdose. Methods. We analysed 4,095 adults presenting to NHS Lothian emergency departments with paracetamol overdose (2017-2023). Elastic-net logistic regression was fitted to 37 routinely collected electronic health record features to predict a composite of death or mental health inpatient admission at 0-7, 8-30 and 31-365 days following attendance, evaluated on a held-out 20% test set with bootstrapping. Findings. Events occurred in 5.5% of patients at 0-7 days, 2.0% at 8-30 days and 7.9% at 31-365 days, dominated by mental health admission. Bootstrap AUROC 95% confidence intervals lay above 0.5 in every window (0.65-0.82, 0.63-0.90, 0.71-0.85): models ranked patients better than chance. Calibration slopes (1.04, 1.14, 1.07) were close to one. Ranking drew primarily on mental health-related features. Conclusions. Routinely collected health data carried predictive signal for severe outcomes after paracetamol overdose, although discrimination fell short of what is needed for individual-level clinical use. Clinical implications. These models are not proposed for clinical deployment; however, treating risk prediction as a settled question will redirect research efforts, potentially excluding this patient population from machine learning advances driving improvements in care in other medical specialties.
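The bootstrap confidence intervals for AUROC quoted above can be sketched as follows. This covers only the evaluation step: the elastic-net model is not reproduced, `scores` stands for predicted probabilities on the held-out test set, and the percentile bootstrap that discards resamples lacking either class is an assumption, since the abstract does not state which bootstrap variant was used. Function names and data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random event case scores higher than a random non-event (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(y_true: Sequence[int], scores: Sequence[float],
                       n_boot: int = 2000, alpha: float = 0.05,
                       seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for AUROC, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing only one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical test-set outcomes (1 = death or admission in the window) and predicted risks.
y = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]
p = [0.05, 0.10, 0.20, 0.30, 0.25, 0.60, 0.15, 0.40, 0.35, 0.08]
print(auroc(y, p), bootstrap_auroc_ci(y, p, n_boot=500))
```

An interval whose lower bound lies above 0.5 is the "better than chance" criterion used in the abstract. The pairwise AUROC is quadratic in sample size; a rank-based formula would be faster for a full test set resampled thousands of times.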

9

Explainable, personalised prediction of emergency readmission and mortality following hospitalisation in patients with heart failure

Gallego Luxan, B.; Huberts, L.; Yu, J.; Blake, V.; Liu, L.; Jorm, L.; Ooi, S.-Y.

2026-07-17 cardiovascular medicine 10.64898/2026.07.15.26358201 medRxiv

Top 2%

0.4%

Background: Unplanned emergency readmissions remain common following hospitalisation for heart failure (HF). Residual congestion, atrial fibrillation, frailty, and other comorbidities contribute to adverse outcomes after discharge. Identifying patients at high risk of readmission or death may help target post-discharge management. Methods: We conducted a retrospective cohort study of patients hospitalised with HF in selected New South Wales hospitals who were discharged alive and not documented as receiving end-of-life care. Clinical, laboratory, medication, and text-derived variables extracted from electronic health records were used to develop predictive models and corresponding risk scores for emergency readmission and all-cause mortality within 180 days of discharge. Feature importance methods were used to identify key predictors and explain individual risk estimates. To illustrate model predictions while preserving patient privacy, we generated representative synthetic patient profiles by summarising the characteristics of groups of patients with similar predicted risk patterns and visualised the major contributors to their predicted risks using Shapley values. Results: The study included 5,202 hospitalisations among 3,933 patients. Within 180 days of discharge, 45.2% of patients experienced at least one emergency readmission and 12.4% died. The most common causes of emergency readmission were recurrent HF, followed by atrial fibrillation, chest pain, and pneumonia. Predictive performance was moderate for emergency readmission (AUC 0.70; calibration slope 1.30) and good for mortality (AUC 0.84; calibration slope 1.01). Emergency readmission risk was primarily associated with greater prior healthcare utilisation, a higher number of active medical problems, high risk of falls, older age, and impaired kidney function. 
Mortality risk was most strongly associated with abnormal red blood cell distribution width, elevated blood urea, older age, and lower systolic blood pressure. A lower number of discharge medications, particularly cardiovascular therapies, was associated with a higher risk of emergency readmission and a lower risk of mortality. Representative synthetic patient profiles demonstrated heterogeneity in the factors contributing to predicted risks, illustrating the value of patient-level risk visualisation. Conclusions: Predictive models identified clinically meaningful predictors of emergency readmission and mortality following HF hospitalisation. Patient-level visualisation of individual risk drivers may support more personalised post-discharge management.
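Shapley values attribute a prediction to individual features by averaging each feature's marginal contribution over all orderings. The sketch below computes them exactly for a toy model by enumerating feature subsets, with absent features set to a reference (baseline) patient. The baseline-replacement convention, the toy model and its coefficients are assumptions for illustration, not the study's model; exact enumeration is only practical for a handful of features, which is why libraries such as SHAP rely on approximations or model-specific algorithms.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition take the patient's values; the rest take the baseline's.
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Toy additive risk score with hypothetical coefficients: age, blood urea, systolic BP.
def toy_risk(z: Sequence[float]) -> float:
    return 0.03 * z[0] + 0.10 * z[1] - 0.02 * z[2]


patient = [82, 14.0, 100]
reference = [70, 7.0, 125]
contributions = shapley_values(toy_risk, patient, reference)
print(contributions)
```

The contributions always sum to model(patient) minus model(reference), so a per-patient bar chart of them decomposes that patient's predicted risk into feature-level drivers.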

10

FLT3-ITD signals for CEBPA and p53 proteolysis by the ubiquitin-proteosome pathway

Gu, X.; Biswas, S.; Zahran, Z. A.; Bae, S.; Balusu, R.; Jha, B. K.; Maciejewski, J. P.; Saunthararajah, Y.

2026-07-15 cancer biology 10.64898/2026.07.14.738455 medRxiv

Top 2%

0.4%

Internal tandem duplication of the receptor tyrosine kinase FLT3 (FLT3-ITD) generates ligand-independent signaling and is highly recurrent in acute myeloid leukemias (AMLs). One way signaling pathways can quickly influence cell fates is by phosphorylating key fate-determining proteins to trigger their proteolysis. We investigated CEBPA, the master transcription factor (MTF) driver of granulo-monocytic lineage fates, for regulation by this mechanism because we found high CEBPA mRNA but little CEBPA protein in FLT3-ITD versus FLT3-wildtype AML cells, and inhibiting FLT3-ITD signaling with tyrosine kinase inhibitors (TKIs) rapidly rescued CEBPA protein. Mass spectrometry analyses of CEBPA and its interactome demonstrated prominent interactions with the major ubiquitin-proteasome pathway (UPP) components UHRF1 and USP7. TKI treatments decreased CEBPA and USP7 phosphorylation at serine 21 and serine 18, respectively, alongside a shift in CEBPA interactions from the degradative ubiquitin ligase UHRF1 toward the protective deubiquitinase USP7. The rescued CEBPA activated granulocytic differentiation. Supporting the view that these serine phosphorylations were phospho-degrons, UPP inhibitors (bortezomib, MG132) increased phosphorylated and total CEBPA and USP7. Because the MTF regulator of apoptosis p53 is a known USP7 client, we also evaluated p53 status: TKIs and UPP inhibitors stabilized USP7 and p53, triggering apoptosis in addition to granulocytic differentiation specifically in FLT3-ITD but not FLT3-wildtype AML cells. UPP inhibitors also produced these effects in TKI-resistant FLT3-ITD AML cells. These data predicted that genetic loss of function of CEBPA or TP53 would be redundant in the FLT3-ITD context, a prediction borne out by the mutual exclusivity of these mutations in clinical series. In summary, FLT3-ITD signals for CEBPA and p53 proteolysis to block lineage maturation and apoptosis, positioning UPP inhibitors as therapeutic candidates acting downstream of TKIs.
Key points:
- The oncoprotein kinase FLT3-ITD signals for CEBPA and p53 proteolysis and hence suppresses lineage differentiation and apoptosis.
- Proteasome inhibitors are candidate remedies to restore CEBPA and p53, acting downstream of presently used FLT3-ITD kinase inhibitors.

11

Diversity and Utilization Patterns of Medicinal Plants Used in the Management of Diabetes Mellitus: An Ethnobotanical Study in Selected Communities in Sierra Leone

Kamara, S.; Jimmy, A. I.; Gary, L. P.

2026-07-21 pharmacology and therapeutics 10.64898/2026.07.18.26358386 medRxiv

Top 2%

0.3%

Background: Diabetes mellitus is an increasing public health challenge in Sierra Leone, where access to diagnosis, treatment, and long-term care remains limited. Traditional medicine continues to play a significant role in disease management; however, ethnobotanical knowledge related to diabetes remains insufficiently documented. Methods: A cross-sectional ethnobotanical survey was conducted among 40 informants, including traditional healers, herbalists, and knowledgeable community members in Waterloo, Pendembu, and Bo. Data were collected using structured questionnaires administered via Kobo Toolbox and paper-based tools. Information on medicinal plants, plant parts used, preparation methods, routes of administration, and knowledge transmission pathways was obtained. Quantitative ethnobotanical indices, including Frequency of Citation (FC), Relative Frequency of Citation (RFC), and Informant Consensus Factor (ICF), were calculated. Results: A total of 21 medicinal plant species were documented. The most frequently cited species were Moringa oleifera (FC = 9; RFC = 0.225), Vernonia amygdalina (FC = 7; RFC = 0.175), and both Cassia siberiana and Telfairia occidentalis (FC = 6; RFC = 0.150). Leaves were the most commonly utilized plant part (40.9%), and decoction was the predominant preparation method (76.2%), with oral administration accounting for 95.2% of use. The Informant Consensus Factor (ICF = 0.69) indicated a relatively high level of agreement among informants. Knowledge was primarily transmitted through apprenticeship and inherited family practices. Conclusion: Traditional medicinal plants remain an important component of diabetes management in Sierra Leone. The high level of consensus among informants and the repeated citation of specific plant species suggest structured and culturally validated therapeutic practices. 
The findings provide a foundation for future phytochemical and pharmacological investigations and highlight the need for documentation, preservation, and sustainable utilization of ethnobotanical knowledge.
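The three ethnobotanical indices are simple ratios. Using their usual definitions, which are assumed here rather than quoted from the paper, FC is the number of informants citing a species, RFC = FC / N for N informants, and ICF = (Nur - Nt) / (Nur - 1), where Nur is the number of use reports in a use category and Nt the number of taxa used for it. With N = 40, the RFC values reported above follow directly (9/40 = 0.225, 7/40 = 0.175, 6/40 = 0.150). The helper names and the citation records below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def frequency_of_citation(citations: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """FC per species from (informant, species) pairs, counting each informant once."""
    informants = defaultdict(set)
    for informant, species in citations:
        informants[species].add(informant)
    return {species: len(who) for species, who in informants.items()}


def relative_frequency_of_citation(fc: int, n_informants: int) -> float:
    return fc / n_informants


def informant_consensus_factor(n_use_reports: int, n_taxa: int) -> float:
    # 1.0 when every report names the same taxon; 0.0 when every report names a different one.
    return (n_use_reports - n_taxa) / (n_use_reports - 1)


records = [
    ("healer_1", "Moringa oleifera"),
    ("healer_2", "Moringa oleifera"),
    ("healer_2", "Moringa oleifera"),  # repeat citation by the same informant
    ("healer_3", "Vernonia amygdalina"),
]
fc = frequency_of_citation(records)
print(fc, relative_frequency_of_citation(9, 40))
```

Counting distinct informants rather than raw mentions keeps one enthusiastic informant from inflating a species' FC.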

12

Neonatal admission as a marker of risk for poor educational attainment and special educational needs in children aged 5-11 years

John, A.; Pike, C.; Olga, L.; Sovio, U.; Wong, H. S.; Smith, G. C.; Aiken, C.

2026-07-17 pediatrics 10.64898/2026.07.15.26358132 medRxiv

Top 2%

0.3%

Background: Children born prematurely (before 37 weeks) or admitted to the neonatal unit (NNU) are at increased risk of adverse long-term physical health outcomes. These factors are also recognised to be associated with later academic performance and special educational needs; however, it is not clear whether such broad risk factors could be used as stand-alone heuristics to identify children who may benefit from additional support in educational settings. We aimed to examine the associations between NNU admission and educational attainment in mid-childhood. Methods and Findings: Pregnancy data from a prospective birth cohort (Pregnancy Outcome Prediction Study, Cambridge, United Kingdom, 2008-2012) were linked to national educational outcomes (Department for Education, United Kingdom). Multivariable regression models adjusted for maternal, child, and socioeconomic factors were used to evaluate associations between (i) all NNU admissions, (ii) NNU admissions >48 hours among term-born infants, and (iii) preterm birth without ongoing physical health needs, and educational outcomes at ages 5-11 years. Children who required any NNU care were more likely not to meet expected educational standards across multiple ages and domains in early and mid-childhood: the Early Years Foundation Stage at age 5 (aOR 1.64, 95% CI 1.19-2.27, p=0.003), phonics at age 6 (aOR 2.43, 95% CI 1.72-3.57, p<0.001), and, at age 7, the separately assessed domains of reading (aOR 1.67, 95% CI 1.18-2.38, p=0.004), writing (aOR 1.72, 95% CI 1.25-2.38, p<0.001), mathematics (aOR 1.56, 95% CI 1.09-2.22, p=0.020), and science (aOR 1.85, 95% CI 1.22-2.78, p=0.003). Similar patterns were observed among term-born infants who stayed >48 hours in the NNU (phonics assessment at age 6 aOR 2.26, 95% CI 1.51-3.36, p<0.001) and among children born preterm without long-term physical health sequelae (phonics assessment at age 6 aOR 3.07, 95% CI 1.96-4.81, p<0.001).
These associations were robust to adjustment for demographic, perinatal, and socio-economic factors. By age 11, differences in academic attainment were attenuated and no longer clearly distinguishable across all exposure groups. However, there was an increased likelihood of special educational needs (SEN) at age 11 associated with any NNU admission (aOR 1.78, 95% CI 1.15-2.73, p=0.009), term-born NNU admission for >48 hours (aOR 1.88, 95% CI 1.19-3.00, p=0.007), and preterm birth without long-term physical health sequelae (aOR 1.50, 95% CI 1.00-2.25, p=0.049). Predictive performance of any NNU admission for SEN at age 11 was moderate (AUC 0.70, 95% CI: 1.14-2.65, p=0.010), with balanced sensitivity and specificity and high negative predictive value. Conclusions: NNU admission, for both term and preterm infants, is associated with poorer educational outcomes and an increased likelihood of special educational needs in mid-childhood.

13

Prompt Engineering Limitations: Preliminary Evaluation of Large Language Models for Psychotherapy Safety

Ngo, N.; Dao, G.; Sano, A.

2026-07-18 psychiatry and clinical psychology 10.64898/2026.07.16.26358261 medRxiv

Top 2%

0.3%

Large Language Models are increasingly used in consumer-facing mental health tools, many of which claim that prompt engineering alone can ensure safe therapeutic behavior. This study evaluates that assumption by testing 20 proprietary and open-source LLMs on high-risk psychiatric scenarios, using prompts grounded in behavioral therapy principles. Prompt engineering reduced some predictable risks, such as explicit endorsement of self-harm, but consistently failed in ambiguous or clinically nuanced situations. Models frequently validated harmful statements, colluded with hallucinations, minimized symptoms, or used stigmatizing language, including in the newest and largest models. These failures reflect structural limitations such as lack of memory, insufficient contextual reasoning, and training-related biases. Prompt engineering alone is therefore insufficient for safe AI-mediated psychotherapy; clinician-guided fine-tuning, integrated safety mechanisms, and system-level oversight will be required. This work provides early evidence motivating deeper clinician-led evaluation and safety-oriented model development.

14

Analytical perturbation reveals hidden instability of biological phenotypes

Piorkowska, N. J.; Ostromecki, A.; Franik, G.; Bizon, A.

2026-07-16 endocrinology 10.64898/2026.07.13.26357916 medRxiv

Top 3%

0.3%

Background Unsupervised machine learning has become a cornerstone of computational phenotyping across clinical medicine, genomics, imaging, and multi-omics research. However, phenotype discovery relies on a sequence of analytical decisions - including missing-data handling, preprocessing, dimensionality reduction, clustering methodology, and stochastic initialization - that are rarely evaluated collectively. Although clustering stability has been extensively investigated, the robustness of complete analytical workflows remains largely unexplored. Results We developed an Analytical Perturbation Framework that systematically quantifies the robustness of phenotype discovery by perturbing complete unsupervised learning workflows rather than individual clustering algorithms. Using a real-world cohort of 1,286 women with polycystic ovary syndrome (PCOS), we generated 116 valid analytical pipelines comprising alternative preprocessing strategies, missing-data handling methods, dimensionality reduction approaches, clustering algorithms, and random initializations. Agreement between independently generated phenotype solutions was consistently low (median Adjusted Rand Index = 0.079), indicating substantial sensitivity of phenotype discovery to routine analytical decisions. Variance decomposition identified preprocessing as the largest contributor to phenotype instability (22.8%), followed by clustering methodology (14.6%), whereas stochastic initialization explained only 3.1% of the observed variability. At the patient level, most individuals exhibited reproducible phenotype assignments (median Patient Robustness Score = 0.719), although a substantial subgroup showed markedly lower assignment stability. 
Feature perturbation analyses identified follicle-stimulating hormone, anti-thyroglobulin antibodies, anti-thyroid peroxidase antibodies, total testosterone, luteinizing hormone, and androstenedione as the strongest contributors to computational robustness, a ranking that reflects influence on robustness rather than biological importance. Finally, phenotype solutions demonstrating greater computational robustness also exhibited greater biological coherence during independent validation.
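The Adjusted Rand Index compares two partitions of the same patients, correcting pair-counting agreement for chance (0 means chance-level agreement, 1 means identical partitions). Summarising all pairwise comparisons between pipelines by their median gives a single stability figure like the one reported above. The sketch below implements the standard Hubert-Arabie ARI from a contingency table; `median_pairwise_ari` and the toy label vectors are illustrative assumptions, not the authors' Analytical Perturbation Framework.

```python
import statistics
from collections import Counter
from itertools import combinations
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Hubert-Arabie ARI between two labelings of the same items."""
    n = len(a)
    pairs_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        # Degenerate case (e.g. both labelings put everyone in one cluster,
        # or both use only singletons): treat as identical.
        return 1.0
    return (pairs_cells - expected) / (maximum - expected)


def median_pairwise_ari(solutions: Sequence[Sequence[Hashable]]) -> float:
    """Median ARI over all pairs of phenotype solutions (one label vector per pipeline)."""
    return statistics.median(adjusted_rand_index(s, t) for s, t in combinations(solutions, 2))


pipelines = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 2, 2],  # same partition as the first, different label names
    [0, 1, 0, 1, 2, 2],
]
print(median_pairwise_ari(pipelines))
```

Because ARI compares which patients are grouped together, it is unaffected by arbitrary cluster numbering, which matters when pipelines label the same phenotype differently.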

15

Development and external validation of deep learning models for spontaneous preterm birth prediction from mid-trimester cervical ultrasound

Chanian, R.; Mishra, D.; Jain, R.; Sharma, N.; Khurana, A.; Tripathi, R.; Tripathi, A.; GARBH-Ini study group; Wadhwa, N.; Noble, J. A.; Thiruvengadam, R.; Desiraju, B. K.; Bhatnagar, S.

2026-07-19 obstetrics and gynecology 10.64898/2026.07.17.26358221 medRxiv

Top 3%

0.3%

Preterm birth is the leading cause of neonatal death. Despite sustained efforts to identify high-risk women in the mid-trimester, accurate prediction remains difficult. Quantitative cervical ultrasound texture has been proposed as a predictor of spontaneous preterm birth; however, earlier models were developed in small single-centre samples and were not externally validated. We developed image-texture (Local Binary Patterns with a Random Forest), deep-learning (Vision Transformer), clinical-variable, and multimodal models to predict spontaneous preterm birth in the prospective GARBH-Ini cohort. We then externally validated our best models on an independent cohort scanned on a different ultrasound machine. Our best overall model reached an internal-test area under the receiver-operating-characteristic curve of 0.71 (95% CI 0.60, 0.82), but its performance fell to 0.52 (95% CI 0.38, 0.64) on external validation. The deep-learning and multimodal models did not perform better. Discrimination appeared higher in a clinically high-risk subgroup at the 34-week threshold, but these estimates were imprecise because of the small number of cases and need to be confirmed in future studies. One of several likely reasons for the modest external performance is the heterogeneity of preterm birth. Predicting distinct preterm-birth subtypes separately, and integrating additional biomarkers and data domains, might improve model performance. Keywords: preterm birth; cervical ultrasound; prediction model; external validation; deep learning
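The reported AUCs can be read as concordance probabilities. The sketch below, a hypothetical helper rather than the study's evaluation code, computes the AUC as the proportion of (case, non-case) pairs in which the case receives the higher predicted risk, counting ties as one half; on this reading, an external AUC of 0.52 is close to the 0.5 expected from random ranking.

```python
from itertools import product


def concordance_auc(case_scores, control_scores):
    """AUC as P(score_case > score_control), with ties counted as 0.5.

    Equivalent to the Mann-Whitney U statistic divided by (n_cases * n_controls).
    Uses an O(n*m) pairwise loop for clarity rather than speed.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy risk scores invented for illustration, not study data
print(concordance_auc([0.9, 0.8], [0.1, 0.8]))  # 0.875
```

For large cohorts a rank-based implementation is faster, but the pairwise form makes the probabilistic meaning of the statistic explicit.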

16

Transient Apical Sparing in Hypertensive Heart Disease Explained by Laplace's Law

Hwang, I.-C.; Kim, H. M.; Jang, Y.; Bak, M.; Park, J.; Jeon, J.; Lee, S.-A.; Choi, H.-M.; Yoon, Y. E.; Cho, G.-Y.

2026-07-19 cardiovascular medicine 10.64898/2026.07.16.26358114 medRxiv

Top 3%

0.3%

Background: Apical sparing of left ventricular longitudinal strain (LS) is an echocardiographic clue to cardiac amyloidosis but may also occur in hypertensive heart disease (HHD). Objectives: To determine whether apical sparing in HHD is associated with regional left ventricular wall stress estimated according to Laplace's law. Methods: We retrospectively studied 1,559 patients with HHD, 47 with light-chain cardiac amyloidosis (ALCA), and 409 normotensive controls. Artificial intelligence-assisted echocardiography quantified segmental LS, wall thickness, and cavity radius at the basal, midventricular, and apical levels. Wall stress was estimated as mean blood pressure (MBP) × radius / (2 × wall thickness). Apical sparing was defined as a relative regional strain ratio (RRSR) ≥ 1.0. Results: Apical sparing was present in 14 patients with HHD (0.9%), 13 with ALCA (27.7%), and no controls. Among HHD patients with apical sparing, RRSR decreased from 1.11 ± 0.13 to 0.72 ± 0.10 after antihypertensive treatment (P < 0.001), accompanied by reduced wall stress and improved basal and midventricular LS, with resolution of apical sparing in all 14 patients. In the overall HHD cohort, changes in MBP and left ventricular mass index were independently associated with changes in RRSR. In an exploratory analysis of HHD patients with apical sparing, a reduction in basal wall stress was associated with a reduction in RRSR (β = 0.267 for ΔRRSR × 100, 95% CI 0.023 to 0.511; P = 0.036). In ALCA, favorable hematologic response was the only determinant of RRSR reduction. Conclusions: Apical sparing in HHD was uncommon but reversible and may represent a load-sensitive deformation pattern associated with regional wall stress, consistent with Laplace's law.
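The wall-stress estimate quoted in the Methods is simple arithmetic, sketched below per ventricular level together with the RRSR ≥ 1.0 apical-sparing flag. The abstract does not define RRSR, so the ratio used here (mean apical LS divided by the sum of mean basal and mean midventricular LS, the widely used relative apical sparing index) is an assumption, as are all function names.

```python
from statistics import mean


def laplace_wall_stress(mbp, radius, wall_thickness):
    """Wall stress as stated in the abstract: MBP x radius / (2 x wall thickness).

    radius and wall_thickness must share a length unit (e.g. both cm), so the
    result is expressed in the pressure unit of MBP (e.g. mmHg).
    """
    if wall_thickness <= 0:
        raise ValueError("wall thickness must be positive")
    return mbp * radius / (2 * wall_thickness)


def relative_regional_strain_ratio(basal_ls, mid_ls, apical_ls):
    """ASSUMED definition: |mean apical LS| / (|mean basal LS| + |mean mid LS|).

    LS values are negative percentages, so magnitudes are compared.
    """
    denominator = abs(mean(basal_ls)) + abs(mean(mid_ls))
    if denominator == 0:
        raise ValueError("basal and mid strain cannot both be zero")
    return abs(mean(apical_ls)) / denominator


def has_apical_sparing(rrsr, threshold=1.0):
    """Apical sparing flag using the abstract's RRSR >= 1.0 cutoff."""
    return rrsr >= threshold


# Toy values, not patient data: a thicker wall at the same MBP and radius lowers stress
print(laplace_wall_stress(100, 2.5, 1.25))  # 100.0
rrsr = relative_regional_strain_ratio([-8.0, -8.0, -8.0], [-10.0, -10.0], [-20.0])
print(round(rrsr, 2), has_apical_sparing(rrsr))  # 1.11 True
```

In this formulation, lowering MBP or increasing wall thickness at a given level reduces the estimated stress there, which is the load dependence the authors invoke to explain reversible apical sparing.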

17

Composite Artificial Intelligence-Enabled Electrocardiogram for Detection and Prediction of Structural Heart Disease

Lee, H. S.; Kang, S.; Lee, M. S.; Pandey, A.; Kim, M.; Jang, J.-H.; Jo, Y.-Y.; Lim, J.; Son, J. M.; Kim, K. S.; Kwon, J.-m.; Lee, S.-P.; Kim, K.-H.

2026-07-21 cardiovascular medicine 10.64898/2026.07.21.26358539 medRxiv

Top 3%

0.3%

Background: Structural heart disease (SHD) drives heart failure and cardiovascular mortality but remains underdiagnosed, and echocardiography is limited as a population-level screening tool. Objectives: We evaluated whether a composite artificial intelligence-enabled electrocardiogram (AI-ECG), combining independently developed models for left ventricular systolic (LVSD) and diastolic dysfunction (LVDD), identifies prevalent and predicts incident SHD across diverse populations. Methods: In this multinational cohort study, detection was assessed cross-sectionally in a Korean clinical cohort (Incheon Sejong Hospital) and a US dataset (Columbia University Irving Medical Center), and incident risk was assessed in the Korean cohort and the UK Biobank among individuals without baseline SHD or heart failure. Adults with paired ECG and echocardiography were analyzed for detection, with the composite defined as positive on either model. SHD comprised reduced left ventricular ejection fraction, moderate or severe valvular disease, left ventricular hypertrophy, or pulmonary hypertension. Detection was assessed by sensitivity and specificity, and incident risk by Cox models and the C statistic. Results: Among 46,082 and 36,286 participants in the two detection cohorts, the composite detected SHD with sensitivity of 71.8% and 76.1% and specificity of 88.3% and 70.1%, with positivity across all phenotypes. Among at-risk individuals, composite positivity was associated with incident SHD (hazard ratios, 3.75 and 2.75), with C statistics of 0.69 to 0.78. Conclusions: A composite AI-ECG identified prevalent and predicted incident SHD across multinational cohorts, capturing signals beyond its training targets and supporting its potential as a scalable cardiovascular screening tool; whether ECG-based risk stratification improves outcomes requires prospective evaluation.
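The composite rule in the Methods (positive if either the LVSD or the LVDD model is positive) and the two detection metrics can be sketched as follows. This is an illustrative outline with hypothetical names and toy inputs, not the authors' pipeline; in practice each component model's binary call would come from its own probability threshold.

```python
def composite_positive(lvsd_flags, lvdd_flags):
    """Composite AI-ECG call: positive if either component model is positive."""
    if len(lvsd_flags) != len(lvdd_flags):
        raise ValueError("flag lists must be the same length")
    return [a or b for a, b in zip(lvsd_flags, lvdd_flags)]


def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) of binary calls against reference labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference-positive and one reference-negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: SHD status from echocardiography is the reference standard
lvsd = [True, False, False, False]
lvdd = [False, True, False, False]
shd_on_echo = [True, True, False, True]
composite = composite_positive(lvsd, lvdd)
print(composite)  # [True, True, False, False]
print(sensitivity_specificity(composite, shd_on_echo))
```

Because an OR rule flags every patient that either component flags, the composite's sensitivity is at least that of each component, while its specificity can only be equal or lower.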

18

From Real-World Data to Virtual Intervention: A Probabilistic Neural Network for Simulating Kidney Function Preservation via Proteinuria Reduction

Takeda, A.; Igata, H.; Mizuno, K.; Yano, Y.; Nagasu, H.; Ohashi, M.; Kashihara, N.; Kobayashi, H.

2026-07-15 nephrology 10.64898/2026.07.12.26357786 medRxiv

Top 3%

0.2%

Predicting long-term kidney function decline is critical for timely intervention but remains challenging. Although the urinary protein-to-creatinine ratio (uPCR) is a potential surrogate endpoint, whether its short-term reduction translates into long-term nephroprotection requires investigation. This study aimed to develop a probabilistic neural network model that captures both the estimated glomerular filtration rate (eGFR) slope and its uncertainty from baseline clinical characteristics. Using a retrospective dataset, we designed a neural network that outputs a predictive distribution (mean and standard deviation σ) for the eGFR slope. SHAP (SHapley Additive exPlanations) was used for model interpretation, and a simulation study quantified the impact of uPCR reduction. In the validation set, the model achieved a Pearson's correlation coefficient of 0.56 and an RMSE of 2.81 mL/min/1.73 m²/year between predicted and actual slopes. SHAP analysis identified uPCR as the most influential predictor, with higher baseline levels associated with a more rapid eGFR decline. Furthermore, a simulated 62% uPCR reduction produced a significant improvement in the predicted eGFR slope, an effect most pronounced in patients with high baseline uPCR. This proof-of-concept study reinforces the critical role of uPCR in predicting eGFR slope and suggests that its reduction may contribute to long-term kidney function preservation, warranting validation in larger and more diverse real-world datasets.
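A network that outputs a mean and a standard deviation σ is commonly trained by minimizing the Gaussian negative log-likelihood, and a virtual intervention can be run by editing one input and re-predicting. The abstract states neither the loss nor the simulation mechanics, so the sketch below is an assumption throughout: `gaussian_nll` is the standard per-sample loss, `simulate_upcr_reduction` lowers baseline uPCR by 62% and reports the change in predicted mean slope, and `toy_predict` is an invented linear stand-in for the trained network.

```python
import math


def gaussian_nll(y, mu, sigma):
    """Per-sample negative log-likelihood of observed slope y under Normal(mu, sigma**2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)


def simulate_upcr_reduction(predict, patient, reduction=0.62):
    """Change in predicted mean eGFR slope after lowering baseline uPCR by `reduction`.

    `predict` is any callable mapping a feature dict to (mean, sigma).
    """
    treated = dict(patient)  # copy so the original record is unchanged
    treated["upcr"] = patient["upcr"] * (1 - reduction)
    mu_before, _ = predict(patient)
    mu_after, _ = predict(treated)
    return mu_after - mu_before


def toy_predict(features):
    """HYPOTHETICAL stand-in model: higher uPCR gives a steeper (more negative) slope."""
    return -0.5 - 2.0 * features["upcr"], 1.0


print(round(simulate_upcr_reduction(toy_predict, {"upcr": 1.0}), 2))  # 1.24
print(round(gaussian_nll(0.0, 0.0, 1.0), 4))  # 0.9189
```

Because the toy model is linear in uPCR, its simulated gain grows in proportion to baseline uPCR; a trained network need not behave this way, which is why the authors' simulation result is informative.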

19

Critically Ill Children Frequently Receive Medications with Established but Unused Pharmacogenomic Guidelines: Actionable Findings from an Integrated Electronic Medical Record and Exome Sequencing Study

Lynch, N.; Elefant, N.; Revah-Politi, A.; Geneslaw, A. S.; Beckett, J.; Wall, J. B.; Aguilar Breton, C.; Sabatello, M.; Kernie, S. G.; Bayir, H.; Gharavi, A. G.; Motelow, J. E.

2026-07-20 genetic and genomic medicine 10.64898/2026.07.16.26358240 medRxiv

Top 3%

0.2%

Importance: Pharmacogenomic (PGx) guidelines can improve medication efficacy and reduce toxicity, but their application in pediatric intensive care units (PICUs) remains largely unexplored. Objective: To determine how frequently medications with established PGx guidelines are administered in the PICU and to assess the capacity of exome sequencing to capture PGx phenotypes for these medications. Design: Retrospective cohort study integrating electronic medical record and exome sequencing data. Setting: Morgan Stanley Children's Hospital of NewYork-Presbyterian, a single-center tertiary care children's hospital. Participants: A total of 4,939 children admitted to the PICU (2020-2024) and 192 children admitted to the PICU who underwent exome sequencing for research purposes (2015-2023). Exposure: Critical illness requiring PICU admission. Main Outcomes and Measures: Frequency of administration of medications with established PGx guidelines in the PICU and the proportion of individuals with exome sequencing who had identifiable PGx phenotypes. Results: Among 4,939 PICU patients, 37.2% (n=1,837) received at least one medication with established PGx guidelines and 14.4% (n=712) received two or more such medications. Twenty PGx genes were implicated; CYP2C9 was the most common (17.3%, n=853). An estimated 8.2% of patients received medications for which PGx-guided recommendations would have altered clinical management. Among 192 patients who underwent exome sequencing, at least one metabolizer phenotype was identified in 62% (n=119). Conclusions and Relevance: Many critically ill children receive medications with established PGx guidelines. This study highlights an opportunity for more personalized medicine for critically ill children admitted to a tertiary care hospital and assesses the strengths and weaknesses of exome sequencing for uncovering pertinent PGx phenotypes.

20

Personality Traits, Trust, and Acceptance of Artificial Intelligence Assistive Systems: Evidence from Nigeria Population

Onah, C.; Ogwuche, C. H.; Haruna, A. I.

2026-07-17 psychiatry and clinical psychology 10.64898/2026.07.16.26358233 medRxiv

Top 3%

0.2%

The increasing deployment of artificial intelligence (AI) assistive systems across healthcare, education, and organisational domains necessitates a deeper understanding of the dispositional factors shaping trust and acceptance. This study investigated the Big Five personality traits as predictors of trust in and acceptance of AI assistive systems in an adult sample (N = 380) in Makurdi, Benue State. Anchored in the Technology Acceptance Model (TAM) developed by Davis (1989), the study examined both direct and indirect pathways linking personality traits to AI acceptance through trust. Participants completed the Big Five Inventory, the Trust in AI Scale, and the AI Acceptance Scale. Data were analysed using structural equation modelling (SEM) with maximum likelihood estimation. For the hypothesised model, RMSEA (.05) indicated good fit, whereas CFI (.84) and TLI (.82) fell below the conventional .90 threshold for acceptable fit. Openness to experience (β = .34, p < .001) and agreeableness (β = .27, p < .01) significantly predicted trust in AI systems, which in turn strongly predicted AI acceptance (β = .62, p < .001). Neuroticism negatively predicted trust (β = -.29, p < .001), while conscientiousness showed a modest positive direct effect on acceptance (β = .18, p < .05). Extraversion was not a significant direct predictor but exerted an indirect effect through trust. Mediation analysis confirmed that trust significantly mediated the relationship between personality traits and AI acceptance. The findings underscore the centrality of dispositional traits in shaping trust in technology and highlight the psychological architecture underlying human-AI interaction. These results contribute to social psychological theory and provide empirical guidance for designing personality-sensitive AI systems to enhance user adoption and sustained engagement.
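In a simple mediation model, the indirect effect of a trait on acceptance through trust is commonly estimated as the product of the trait-to-trust path (a) and the trust-to-acceptance path (b). The sketch below applies that product-of-coefficients rule to the standardized coefficients quoted above; the resulting values are illustrative arithmetic, not effects reported by the authors, and testing them would require the fitted model (for example, bootstrapped confidence intervals). The function and variable names are hypothetical.

```python
def indirect_effect(a_path, b_path):
    """Product-of-coefficients indirect effect: X -> M (a) times M -> Y (b)."""
    return a_path * b_path


TRUST_TO_ACCEPTANCE = 0.62  # b path quoted in the abstract
# a paths (trait -> trust) quoted in the abstract
trait_to_trust = {"openness": 0.34, "agreeableness": 0.27, "neuroticism": -0.29}

for trait, a in trait_to_trust.items():
    # Sign follows the a path because b is positive
    print(f"{trait}: {indirect_effect(a, TRUST_TO_ACCEPTANCE):+.3f}")
```

The product rule only holds for the specific path structure assumed here; with additional mediators or correlated predictors, the indirect effects should be taken from the SEM software's own decomposition.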